PREFACE


In  this  Memorandum  the  author  discusses  some  aspects 
and directions of research in the theory of dynamic programming,
which is an important tool in the study of
multistage  decision  processes. 


SUMMARY 


In  this  paper,  the  author  indicates  areas  of  research 
and  some  interesting  and  significant  problems  which  arise 
in  the  attempt  to  use  a  certain  functional  equation  as  an 
effective algorithm in the study of multistage decision
processes.
SOME DIRECTIONS OF RESEARCH IN DYNAMIC PROGRAMMING

1. INTRODUCTION

Dynamic  programming  is  a  mathematical  theory  designed 
to implement (1) the study of multistage decision processes,
and (2) the solution of problems which can be interpreted
as arising from such processes. In a number of articles,
and particularly in four books [1-4], we have explored
some  of  the  conceptual,  analytic,  and  computational 
aspects  of  this  new  theory,  and  presented  a  variety  of 
applications  in  engineering,  economics,  operations 
research,  mathematical  physics,  and  even  in  mathematics 
itself. 

In  the  pages  that  follow,  we  wish  to  indicate  some 
of  the  interesting,  formidable,  and  significant  problems 
that  arise  as  soon  as  we  attempt  to  use  a  characteristic 
functional  equation  such  as 

(1.1)  f_{n+1}(p) = \max_q [ g(p,q) + f_n(T(p,q)) ]

as  an  effective  algorithm  for  the  numerical  solution  of 
questions  in  the  areas  named  above. 
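
To make the algorithmic use of (1.1) concrete, the following is a minimal sketch in Python, assuming the states p and decisions q form finite sets small enough to tabulate. The function name iterate_returns and the toy stock-withdrawal process are illustrative assumptions and do not come from the works cited.

```python
def iterate_returns(states, decisions, g, T, n_stages):
    """Apply f_{n+1}(p) = max_q [g(p, q) + f_n(T(p, q))] for n_stages steps.

    Starts from f_0 = 0, so the first pass gives f_1(p) = max_q g(p, q).
    Returns the final table of returns and the maximizing decision per state.
    """
    f = {p: 0.0 for p in states}
    policy = {}
    for _ in range(n_stages):
        f_next = {}
        for p in states:
            best_q = max(decisions, key=lambda q: g(p, q) + f[T(p, q)])
            f_next[p] = g(p, best_q) + f[T(p, best_q)]
            policy[p] = best_q
        f = f_next
    return f, policy


# Hypothetical toy process: p is a stock level 0..5, q is the amount requested.
states = range(6)
decisions = range(3)


def g(p, q):
    # Immediate return: the amount actually withdrawn (never more than p).
    return float(min(p, q))


def T(p, q):
    # New state: the stock left after the withdrawal.
    return p - min(p, q)


f3, q3 = iterate_returns(states, decisions, g, T, n_stages=3)
```

Note that the sketch stores exactly the three tables discussed in the next section: the old returns, the new returns, and the maximizing decisions.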

As  the  reader  will  see,  very  little  has  been  done 
so far, and there is unlimited opportunity for research
in  these  new  fields. 




2. DIMENSIONALITY DIFFICULTIES

If p is a point in an N-dimensional space,
p = (p_1, p_2, ..., p_N), we face the problem of storing
three functions of N variables, f_n(p), f_{n+1}(p), and
q_n(p), the maximizing value of q in (1.1), when we
turn to this formula as a computational algorithm.
Proceeding in a direct fashion, which is to say storing
a function as the set of values it assumes, we see that
if each component p_i is allowed k different values,
then a total of 3 x k^N values must be stored to
determine the functions f_n(p) in sequence, starting
with

(2.1)  f_1(p) = \max_q g(p,q).

It follows that if computing time is a factor, we
cannot allow large values of k combined with values
of N > 3. Rapid access storage at the moment is
bounded by 32 x 10^3 words; with various simple devices
we can push this to about 64 x 10^3 or at most 10^5.

Within  the  near  future,  we  can  contemplate  rapid- 
access  storage  of  10^,  and  probably  within  25  years, 
following  the  usual  curves  of  technological  progress, 
we  will  have  an  upper  limit  of  10^,  with  speeds  about 
10^5 or 10^6 faster than those current because of the use
of  solid-state  devices,  miniaturization,  and  parallel 
circuitry. 




Although  these  capabilities  will  trivialize  many 
current  problems,  even  these  fabulous  figures  will  not 
permit  a  routine  approach  to  many  other  outstanding 
problems.  Consider  a  situation  where  each  component 
p_i is allowed to assume 100 different values. If
N = 4, this leads to a total of 3 x 10^8 values; if
N = 6, a total of 3 x 10^12.
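
The arithmetic behind these figures is easily checked. The short Python sketch below simply evaluates 3 x k^N; the function name table_entries is an illustrative assumption.

```python
def table_entries(k, n_dims, n_functions=3):
    """Values to store when each of n_dims state components takes k values
    and n_functions tables (old returns, new returns, policy) are kept."""
    return n_functions * k ** n_dims


# Grid of 100 values per component, as in the text.
counts = {n_dims: table_entries(100, n_dims) for n_dims in (2, 4, 6)}
for n_dims, count in counts.items():
    print(f"N = {n_dims}: {count:.0e} stored values")
```

Each additional state variable multiplies the storage requirement by k, which is why refining the grid and adding dimensions cannot both be afforded.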

The  problem  of  storage  of  state  variables  is  seen 
in  even  starker  form,  if  we  think  in  terms  of  processes 
with  distributed  parameters  where  the  state  variables 
are  functions,  or  in  terms  of  adaptive  processes  where 
the  state  variables  are  probability  distributions. 

It  follows  that  it  is  essential,  as  in  so  many 
areas  of  classical  analysis  and  mathematical  physics, 
to  think  in  terms  of  approximations.  In  the  sections 
that follow, we shall discuss various types of approximate
techniques and indicate the many new mathematical
problems that arise in this fashion.

3.  APPROXIMATION  IN  FUNCTION  SPACE 

The standard initial approach is to isolate those
processes which possess simple analytic expressions for
their optimal policies and return functions. An
important class of processes of this nature are those
where g(p,q) is quadratic in its arguments and
T(p,q) is linear in p and q; see the work in [1-4]
and [5-9]. Other important classes of processes




exist, however; for example, see [10], and the work on
"optimal inventory" equations presented in [1] and [3], and
the many results obtained in [11].

A  fruitful  approach  would  seem  to  be  the  study  of 
the "inverse problem," i.e., given the optimal policy,
determine  all  admissible  return  functions  g(p,q)  and 
transformations  T(p,q),  which  lead  to  this  policy. 

Some preliminary work has been done in [12] and [13].

Another  classical  direction  is  that  of  power  series 
expansions  in  terms  of  state  variables,  time,  or 
parameters  appearing  in  the  equation.  Some  preliminary 
results  in  connection  with  perturbation  theory  are  given 
in [14].

A most important new direction, based upon the
explicit solutions mentioned above, is that of
quasilinearization [15,16,17]. This theory is strongly
connected with the concept of approximation in policy
space, which we shall discuss below, but has its roots in
classical analysis, particularly in functional
inequalities (see [18], where reference is given to
early work of Caplygin).

Finally, let us mention that the functional-equation
technique of dynamic programming applied to
physical processes has produced the theory of invariant
imbedding [19,20,21].


4.  APPROXIMATION  IN  POLICY  SPACE 


In  classical  analysis,  there  exists  only  the 
technique  of  approximation  in  function  space.  Given  an 
equation  for  an  unknown  function  such  as 

(4.1)  f  =  T(f), 

we generate a sequence {f_n} by means of the relation

(4.2)  f_{n+1} = T(f_n).

Under  suitable  conditions,  this  sequence  converges  to  a 
solution  of  (4.1). 

Presented  with  the  functional  equation 

(4.3)  f = \max_q T(f,q),

we  can  proceed  as  above.  However,  dynamic  programming 
offers  a  new  mode  of  approximation:  approximation  in 
policy  space.  In  place  of  concentrating  directly  upon 
the  return  function  f,  we  can  focus  upon  the  policy 
function  q. 

An  advantage,  of  both  analytic  and  computational 
significance,  is  that  approximation  in  policy  space 
automatically  yields  monotone  convergence. 
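
The monotone behavior can be observed in a small experiment. The Python sketch below is one possible reading of approximation in policy space, under assumptions that are not in the text: the process is stationary, the states and decisions are finite, and T(f,q) takes the discounted form g(p,q) + a f(T(p,q)) with 0 <= a < 1. The function names and the toy machine-replacement process are hypothetical.

```python
def evaluate_policy(policy, states, g, T, a, sweeps=500):
    """Return of a fixed policy: solve f(p) = g(p, q(p)) + a * f(T(p, q(p)))
    by repeated substitution, which converges because 0 <= a < 1."""
    f = {p: 0.0 for p in states}
    for _ in range(sweeps):
        f = {p: g(p, policy[p]) + a * f[T(p, policy[p])] for p in states}
    return f


def improve_policy(f, states, decisions, g, T, a):
    """In each state, pick the decision that is best against the return f."""
    return {p: max(decisions, key=lambda q: g(p, q) + a * f[T(p, q)])
            for p in states}


def policy_space_iteration(states, decisions, g, T, a, start, max_rounds=50):
    """Alternate evaluation and improvement, recording each policy's return."""
    policy, history = dict(start), []
    for _ in range(max_rounds):
        f = evaluate_policy(policy, states, g, T, a)
        history.append(f)
        new_policy = improve_policy(f, states, decisions, g, T, a)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history


# Hypothetical process: p is the wear of a machine (0..3);
# q = 0 keeps the machine, q = 1 replaces it.
states = range(4)
decisions = (0, 1)


def g(p, q):
    # Keeping yields 4 - p; replacing nets 1 after the replacement cost.
    return 1.0 if q == 1 else 4.0 - p


def T(p, q):
    # Replacement resets wear to 0; otherwise wear grows, capped at 3.
    return 0 if q == 1 else min(p + 1, 3)


never_replace = {p: 0 for p in states}
final_policy, history = policy_space_iteration(
    states, decisions, g, T, a=0.9, start=never_replace)
```

Each entry of history is the return function of one policy in the sequence, and no state's return decreases from one entry to the next, so an experiment may be stopped at any round with a usable policy in hand.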

In  this  area,  digital  computers  and  mathematical 
experimentation  can  be  extremely  valuable.  As  far  as 
the  operational  solution  of  control  processes  in  the 
engineering and industrial worlds is concerned, we
know  that  there  is  a  great  premium  on  policies  of  simple 
type.  The  computational  testing  of  large  classes  of 
simple  policies  in  various  types  of  multistage  decision 
processes  would  yield  valuable  information  concerning 
the  need  for  more  exact  solutions  and  would  furnish 
useful  clues  for  further  research. 

Many years of application of Rayleigh-Ritz-Galerkin
type  approximation  procedures  teach  us  that  most 
functionals  are  fairly  "flat."  It  would  be  important  to 
make  this  more  precise  as  far  as  optimization  processes 
are  concerned. 

A new type of approximation process called
"stochastic approximation" [22], [23] will play an
important role in multistage decision processes. In any
case, it is clear that there are many new ideas to be
developed in these areas, ideas quite different from
those of classical analysis.

5.  APPROXIMATION  IN  INFORMATION  PATTERN 

Quite closely connected with the concept of
approximation in policy space is the problem of the
value of information concerning the state of the system.
Suppose that we are given the values of only k of the
components of p, or, more generally, k functions of
the N components of p. How well can we make
decisions? As k -> N, how rapidly does the optimal
return under partial information approach the optimal
return from full information?

Put another way, how much does lack of information
cost?

It  is  clear  that  there  are  many  interesting  classes 
of  problems  contained  in  this  general  format.  What  is 
currently  called  "Information  Theory"  is,  as  is  shown 
in  [ 24] ,  only  a  very  specialized  version  of  a 
particular  problem  of  this  general  type;  see  also  [25] . 

Once  again,  mathematical  experimentation  with 
digital  computers  will  be  extremely  helpful  in  guiding 
further  research. 

6.  APPROXIMATION  IN  STRUCTURE  SPACE 

The  fundamental  objective  of  mathematical  physics 
is  the  accurate  approximation  of  the  complex  structures 
of  reality  by  mathematical  structures  of  simpler  nature 
which  are  amenable  to  the  analytic  and  computational 
techniques we currently possess. Dynamic programming
and  invariant  imbedding  represent  new  mathematical 
structures,  differing  in  many  ways  from  the  classical 
approaches,  designed  to  take  advantage  of  the 
properties  of  the  digital  computer.  There  must  be  many 
such  new  approaches  waiting  for  the  discerning  eye. 

At  the  present  time,  there  exists  little  work 
devoted  to  the  structure  of  mathematical  processes,  and, 
in particular, to the concept of the degree of
approximation of one process by another. Considering how
fruitful  the  problem  of  the  approximation  of  a  given 
function  by  a  polynomial  has  been,  it  is  clear  that  many 
significant  problems  exist  in  this  area. 

7. DIFFERENTIAL APPROXIMATION

As  a  step  in  the  preceding  direction,  let  us 
consider  the  technique  of  differential  approximation 
[26]. The approximation of a function f(t) on the
interval -\infty < a \le t \le b < \infty by a polynomial

(7.1)  P_n(t) = \sum_{k=0}^{n} a_k t^k

is equivalent to the approximation of f(t) by means
of a solution of the linear differential equation

(7.2)  \frac{d^{n+1}u}{dt^{n+1}} = 0.


Similarly,  to  approximate  to  f(t)  by  means  of  an 
exponential  polynomial  of  the  form 

(7.3)  P_n(t) = \sum_{k=0}^{n} a_k e^{\lambda_k t}


is to approximate to f(t) by means of a solution of the
general linear differential equation with constant
coefficients:

(7.4)  \frac{d^{n+1}u}{dt^{n+1}} + b_1 \frac{d^n u}{dt^n} + \cdots + b_{n+1} u = 0.

In  place  of  the  usual  approach  to  approximation 
problems  of  this  nature,  let  us  proceed  in  the  following 
way. Determine the coefficients b_k in (7.4) by the
condition that the quadratic form

(7.5)  \int_a^b \left( \frac{d^{n+1}u}{dt^{n+1}} + b_1 \frac{d^n u}{dt^n} + \cdots + b_{n+1} u \right)^2 dt


is  minimized.  If  u  is  determined  as  the  solution  of 
a  nonlinear  differential  equation 


(7.6)  \frac{d^k u}{dt^k} = g\left( u, \frac{du}{dt}, \ldots, \frac{d^{k-1}u}{dt^{k-1}} \right),

the numerical work is easily carried out; see [26].
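
As an illustration of why the numerical work is light, consider the lowest-order case: a single coefficient b in u' + b u = 0, fitted to a function u generated by a first-order nonlinear equation u' = g(u). Setting the derivative of the quadratic form with respect to b equal to zero gives b = -\int u' u dt / \int u^2 dt, so both integrals can be accumulated while the equation is being integrated. The Python sketch below follows this reading; the function name fit_first_order, the Runge-Kutta and trapezoidal rules, and the test equation u' = -u^2 are assumptions chosen for illustration, not the procedure of [26].

```python
def fit_first_order(rhs, u0, t_end, steps=2000):
    """Least-squares b for u' + b*u = 0 against the solution of u' = rhs(u).

    Minimizes the integral of (u' + b*u)^2 over [0, t_end], which gives
    b = -int(u' u) / int(u^2). The trajectory comes from classical
    Runge-Kutta steps; the integrals use the trapezoidal rule.
    """
    h = t_end / steps
    u = u0
    num = den = 0.0
    for i in range(steps + 1):
        w = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        du = rhs(u)
        num += w * du * u * h
        den += w * u * u * h
        # One Runge-Kutta step of size h for the autonomous equation.
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return -num / den


# Test case: u' = -u^2, u(0) = 1, whose solution is u = 1/(1 + t).
b = fit_first_order(lambda u: -u * u, u0=1.0, t_end=1.0)
```

For this test case the integrals can be done by hand, \int_0^1 u^3 dt = 3/8 and \int_0^1 u^2 dt = 1/2, so the fitted coefficient should be close to 3/4. Higher orders lead in the same way to a small linear system for the b_k.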

Problems  that  arise  are  those  of  degree  of 
approximation  and  type  of  approximation  to  use.  In  the 
approach  sketched  above,  2n  +  2  parameters  are 
required to determine u(t), the n + 1 coefficients
b_k and n + 1 initial values. Can we obtain superior
approximation by a different allocation of parameters?
For example, might it be better to use a nonlinear
differential equation of lower order, or variable coefficients?
Once  again,  some  mathematical  experimentation  will 
yield  suggestions  concerning  further  research. 
8. SEARCH PROCESSES

So  far,  we  have  concentrated  upon  the  approximation 
to  the  return  function  or  the  policy  function  as  a 
fundamental problem to be overcome in the use of (1.1).
An equally important problem as far as cutting down on
computing time is concerned is that of obtaining the
maximum over q. If q is multidimensional, straightforward
enumerative search will consume far too much
time.

The basic problem is that of utilizing the
structural  features  of  the  process  to  accelerate  the 
search  process.  As  such,  we  see  that  we  are  verging 
upon  pattern  recognition  processes,  and,  indeed,  the 
two  are  closely  related. 
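
A one-dimensional example shows what structure can buy. Suppose, as an assumption not made in the text, that the quantity to be maximized rises strictly to a single peak in q and then falls strictly (as it does when it is strictly concave). Then most candidate values of q can be discarded without being examined. The Python sketch below uses a ternary search over the integers; the function name argmax_unimodal is hypothetical.

```python
def argmax_unimodal(h, lo, hi):
    """Maximize h over the integers lo..hi, assuming h rises strictly to a
    single peak and then falls strictly. Returns (best q, evaluations of h)."""
    evaluations = 0
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third  # two interior probes, m1 < m2
        evaluations += 2
        if h(m1) < h(m2):
            lo = m1 + 1   # the peak lies to the right of m1
        else:
            hi = m2 - 1   # the peak lies to the left of m2
    candidates = range(lo, hi + 1)
    evaluations += len(candidates)
    return max(candidates, key=h), evaluations


# Hypothetical single-peaked function with its maximum at q = 137.
q_best, n_evals = argmax_unimodal(lambda q: -(q - 137) ** 2, 0, 1000)
```

Enumeration would evaluate the function 1001 times here, while the search needs a few dozen evaluations. The sketch treats only a single scalar decision; the multidimensional case raised above remains the difficult one.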

A  brief  discussion  of  some  recent  work  is 
contained in [3]. In general, however, little has been
done in this new field, and there is ample room for
extensive  research. 

9.  SCHEDULING  PROBLEMS  AND  COMPUTERS 

Ideally, what is desired is a self-organizing
computer  which  will  rearrange  its  components  and 
scheduling  so  as  to  solve  a  particular  problem  in  a  most 
efficient  fashion.  Some  work  has  been  done  in  this 
connection  (see  [27]). 


Leaving  aside  an  optimal  reorganization,  even  a 
parallel-operation computer would be an enormous advance.
In any case, vast opportunities exist in these directions
to bring about orders-of-magnitude reduction in computing
time.  An  advance  of  this  type  would  be  equivalent  to  a 
mathematical  advance  of  an  order  of  magnitude. 

10.  CONCLUSION 


We  have  attempted  to  discuss  areas  of  research, 
rather  than  particular  problems.  The  reader  interested 
in specific questions may refer to the books [1-4].